Introduction



■ In Unit 12, you studied the visual sense as a means of receiving 
information. In this unit we widen the discussion to study the other 
human senses, particularly those of hearing and of touch. 

■ It is important to use other senses for many reasons: 

To make information accessible to as many people as possible, we
must overcome the limited nature of the current generation of
human-computer interfaces.

Some researchers and products already exploit these other senses and
modes of information representation in order to overcome what has
become known as the ‘digital divide’.

To increase the richness of communication between humans and
computers.

In some situations, the visual sense is inappropriate or ineffective, 
perhaps because it is fully engaged elsewhere. In these cases the other 
sense(s) can provide extra information. 

Some information (such as music) is best and most naturally 
represented and communicated in non-visual forms. 

This unit aims to examine different types of human communication 
and to assess to what extent these can be used to interact with 
computers. In particular you will study: 

The digital divide and approaches that are being used to overcome it. 
Speech recognition and speech synthesis which enable humans to 
interact with computers through the spoken word. 

Non-speech audio, used as a medium for our interaction with computers 
through sound effects and computer music. 

Handwriting recognition in which written words are converted into 
computer processable characters. 

Tangible and gesture computing in which computers recognize and 
respond to human gestures such as sign language, and the 
manipulation of physical objects in the environment. 

Ubiquitous computing: many of the ethical issues to do with information 
in the real world and our interactions with it come together in this fast- 
moving field of computing. 
The digital divide

This section aims to: 

Describe the concept of the digital divide; 

■ Show how this divide is being bridged by a 
novel and innovative device, the Simputer. 

■ The term digital divide is most commonly used to refer 
to the gap that exists between the information-rich 
developed world and the information-poor developing 
world. 

■ Different amounts of information are available to people
who have access to electronic information sources (the
Internet) and to people who don’t have access to
electronic information sources.



Causes of the digital divide include: 

1 The cost of the computers and the lack of technical 
infrastructure that is required to support them. 

2 The need for a sophisticated electricity distribution
system.

3 Many people in the developing world who need to make
use of IT applications are illiterate.

Some languages do not have a written form and many 
languages are not well supported by existing computer 
applications and operating systems. 

4 Many people in developing countries may not be 
familiar with the English language in which so many of 
the applications they might need to use are presented. 

Bridging the digital divide: the Simputer 

■ Simputer stands for simple inexpensive 
multilingual people’s computer. 

■ The Simputer project has been led by a group of 
Indian scientists and engineers. 

■ Its aim is to bridge the digital divide by providing 
access to the power of the computer via simple, 
natural, user-friendly interfaces based on sight, 
touch and sound. 

The Simputer is similar in size to a hand-held computer but with more 
memory and a more powerful processor. 

■ It has a number of innovations to make it accessible to people who are 
currently unable to use conventional desktop computers, such as: 

The use of a variety of languages that are widely spoken in India. 

Icons that enable the device to be operated by the illiterate, and speech 
synthesis so that such people can have information read out to them. 

Information Mark-up Language 

■ The Simputer project is committed to using open rather than 
proprietary standards in order to keep down costs. 

■ Therefore Simputers are based on the Linux operating system 
and the primary interface is a browser that can render the 
Information Mark-up Language (IML). 

■ IML is a new XML application that has been developed by the 
project team. 

■ One of IML’s main roles is to specify how pages should be
displayed on the Simputer and what text on a web page
should be read out.

■ The text can be turned into artificial-sounding but
nevertheless understandable speech in Indian languages
using the library of sounds stored on the computer.
Speech audio interfaces

When it comes to representing information in sound, vocalizations,
particularly speech, are the most natural means of doing so.
Speech becomes especially important:

When providing interfaces for the illiterate and for those with poor 
literacy skills. 

To people with visual impairment, and in situations where it is difficult to 
type on a keyboard, or where the eye cannot read a computer display. 

The section aims to: 

Describe speech recognition techniques and the limitations of speech 
recognition; 

Describe different ways of synthesizing speech; 

Show how a new generation of household robots is being developed 
that are capable of using speech interaction. 



Speech recognition 

There are two uses for speech recognition systems: 

Dictation: translation of the spoken word into written text. 

Computer Control: control of the computer and software applications 
by speaking commands. 

■ The architecture of a typical voice recognition system is shown in
Figure 3.1.

Figure 3.1 A typical voice recognition system

Stages involved in Automatic Speech Recognition (ASR) are: 

■ First, a microphone detects the sound waves spoken by a 
person and converts them into an analogue electrical signal. 

■ This analogue signal is then converted into a digital signal 
by an analogue/digital converter. 

■ Next, the digital signal is split into the sequence of signals 
that make up each word. 

This is done by looking for areas of low electrical signal, which 
indicate that the user has finished one word and is starting to 
speak the next word. 

■ These digital speech fragments are then recognized as 
individual words by comparison with stored entries in a 
database, often created by training the system to an 
individual’s voice. 
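
The word-splitting stage described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular ASR product: the function name `split_into_words`, the `threshold` value and the `min_gap` length are all assumptions chosen for the example, and the "signal" is a short list of made-up sample values.

```python
def split_into_words(samples, threshold=0.1, min_gap=3):
    """Return (start, end) index pairs for runs of loud samples.

    A run ends once at least `min_gap` consecutive samples fall below
    `threshold` (an assumed stand-in for "areas of low electrical signal").
    The end index is exclusive.
    """
    words = []
    start = None   # index where the current word began, if any
    quiet = 0      # number of consecutive quiet samples seen so far
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The pause is long enough: close the current word.
                words.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # The signal ended while a word was still open.
        words.append((start, len(samples) - quiet))
    return words


# Made-up signal: two bursts of sound separated by a pause.
signal = [0, 0, 0.5, 0.7, 0.6, 0, 0, 0, 0, 0.4, 0.8, 0, 0]
print(split_into_words(signal))  # [(2, 5), (9, 11)]
```

Each pair marks one speech fragment that would then be passed to the recognition stage.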

Simple speech recognition 

1. Isolated word recognizers: 

■ Examples are the telephone answering systems that offer some
form of speech recognition as an alternative to responding to key
presses.

■ These telephone answering systems employ a limited form of 
speech recognition, simply replacing numeric key presses with 
spoken numbers such as ‘one’, ‘two’, and so on. 

■ These speech recognition systems are called isolated word 
recognizers because they are designed to recognize individual 
words. 

■ Pauses at the beginning and end of each word make it easy to 
isolate words and then recognize them. 

Most isolated word recognizers will be designed to be used by 
anyone, i.e. they are speaker independent. 

2. Speaker-enrolment systems:

■ Systems in which the ASR software has been trained to recognize a
single individual.

3. Dictation software: 

■ Such as Dragon’s NaturallySpeaking and IBM’s ViaVoice. 

■ Intended to replace the need to type on a keyboard in order to 
enter text into a computer. 

■ With these products, you speak into a microphone, the ASR 
software attempts to recognize your speech and converts it 
into words that appear on the screen as text. 

■ Advanced speech recognition 

In contexts more general than numbers, there might be a large number 
of candidate words that match the word spoken by the user, due to 
background noise, lack of clarity on the part of the speaker, or the 
conversion process from sounds to electrical signals. 

■ The system is given large databases of words, language and grammar 
rules, information on the frequency with which words are used in the 
user’s language and probabilities that a certain word follows another 
word, in order to identify likely words from a range of possibilities. 

■ Example: I have said: “the dog barked in the morning”

The speech recognition system has identified the first 2 words (the, dog).
The third word has possible candidates: “barged”, “barked”, “barred”
and “boiled”.

Rules in the system indicate that “barked” has the highest probability
of following “dog”, compared with the other candidates.

“Barked” would then be chosen as the recognized word.
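
The idea of choosing between candidate words by how likely each is to follow the previous word can be sketched as a simple table lookup. The probabilities below are invented purely for illustration, and the names `FOLLOWS` and `choose_word` are hypothetical; a real system would derive such figures from large language databases.

```python
# Hypothetical probabilities that each candidate follows "dog".
# The numbers are invented for illustration only.
FOLLOWS = {
    "dog": {"barged": 0.02, "barked": 0.60, "barred": 0.01, "boiled": 0.001},
}


def choose_word(previous, candidates, table=FOLLOWS):
    """Pick the candidate most likely to follow `previous`."""
    scores = table.get(previous, {})
    # Candidates missing from the table are treated as having probability 0.
    return max(candidates, key=lambda word: scores.get(word, 0.0))


print(choose_word("dog", ["barged", "barked", "barred", "boiled"]))  # barked
```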

■ Speech recognition is not speech understanding. 

The computer doesn’t have any understanding of the 
meaning of the words that it has recognized. 

■ Artificial intelligence researchers are attempting to 
construct computer systems that possess some 
understanding of the meaning of individual words. 

■ We refer to this knowledge as common sense: 
knowledge that we take for granted when determining 
the meaning of words. 







Speech synthesis 

■ Speech synthesis involves the creation, by computer, of 
spoken words from text. 

It is the technology underlying text readers.

■ The search for a machine capable of reproducing human 
speech began nearly two centuries before the invention of the 
electronic computer. 

■ Automata were among the early inventions capable of
sounding individual vowels and consonants.

■ The three types (techniques) of speech synthesis are
phoneme-based, diphone-based and model-based speech synthesis.



1. Using phonemes to produce speech 

The individual sounds that are produced by humans are 
called phonemes. 

■ Each language has a number of phonemes (English uses
about 45 phonemes, while Chinese uses about 2000).

■ Speech synthesis based on phonemes involves joining 
together the appropriate phonemes in order to construct 
words. 

■ The phonemes are stored as real speech fragments which 
are concatenated to produce words. 

For example, the word ‘cat’ would be constructed from the 
three phonemes ‘k’, ‘a’ and ‘t’ which would be joined 
together in that order. 
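
The ‘cat’ example can be sketched in Python as follows. The pronunciation dictionary, the stored fragments and the function name `synthesize` are all hypothetical; the short lists of numbers merely stand in for recorded speech fragments.

```python
# Hypothetical pronunciation dictionary and fragment store.
PRONUNCIATIONS = {"cat": ["k", "a", "t"]}
FRAGMENTS = {
    "k": [0.1, 0.3],        # stand-ins for recorded audio samples
    "a": [0.5, 0.6, 0.5],
    "t": [0.2, 0.0],
}


def synthesize(word):
    """Join the stored fragment for each phoneme of `word`, in order."""
    samples = []
    for phoneme in PRONUNCIATIONS[word]:
        samples.extend(FRAGMENTS[phoneme])
    return samples


print(synthesize("cat"))  # [0.1, 0.3, 0.5, 0.6, 0.5, 0.2, 0.0]
```

Because the fragments are simply placed end to end, nothing smooths the joins between them.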

2. Using diphones, an alternative to phonemes 

■ Speech synthesis that uses diphones focuses on the 
changes in sound that consecutive phonemes make 
when they are joined together, similar to the way that we 
join up letters to make words. 

A diphone spans (stretches) from the middle of one phoneme
to the middle of the next.

■ As before, speech synthesis takes place by joining 
together diphones in the correct sequence. 

■ Several speech synthesis programs have been 
constructed using diphones, which produce much 
smoother speech than the phoneme approach. 
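
As a rough sketch of the idea, the diphones needed for a word can be listed by pairing each phoneme with its neighbour. The function name `to_diphones` and the `_` marker for silence at the start and end of the word are assumptions made for this example.

```python
def to_diphones(phonemes, silence="_"):
    """List the diphones needed to speak a phoneme sequence.

    Each diphone names the transition from one phoneme to the next;
    `silence` (an assumed marker) pads the start and end of the word.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(to_diphones(["k", "a", "t"]))  # ['_-k', 'k-a', 'a-t', 't-_']
```

A synthesizer would then look up a stored recording for each of these transitions and join them in sequence.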

3. Model-based speech synthesis 

■ This is the most advanced form of speech synthesis and is 
based on detailed scientific studies of human speech. 

■ The method relies on modeling the way in which humans 
speak. 

■ It simulates the human vocal tract (which produces the sound and
then shapes it in order to speak).

■ The term ‘model-based speech synthesis’ refers to the fact
that the computer has a model, in software, of the way in
which the human vocal tract works.

Since it is simulating the human vocal tract, it requires
more computing power than the other methods, but this is
now possible with a personal computer.






Some problems facing speech synthesis systems are: 

1. How to pronounce a word of text.

The phonetic elements (which make up a description of how each word is 
pronounced and are used in dictionaries) can be used to give a guide to the 
pronunciation. 

Computers can be programmed to convert text into its equivalent sequence of 
phonetic elements, which are then passed to the speech synthesizer. 

2. Ambiguous words which are spelt identically but have different
pronunciations, called homographs, can create problems and
confusion.

Consider the sequence of letters ‘ough’. Compare the pronunciation of the
words ‘through’, ‘though’ and ‘enough’.

3. The computer has no understanding of what it is reading, so it can 
make no attempt at inferring the correct pronunciation while 
speaking the text. 

For example, if we see the number 109, we recognize that it is a number and say 
‘one hundred and nine’, rather than ‘one, zero, nine’. 
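
A text-to-speech front end can work around this by expanding numbers into words before the text reaches the synthesizer. The sketch below handles only whole numbers from 0 to 999; the function names `number_to_words` and `expand_numbers` are hypothetical and it is not taken from any real product.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999 in British English style."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + number_to_words(rest) if rest else "")


def expand_numbers(text):
    """Replace each whitespace-separated whole number (0..999) with words."""
    return " ".join(
        number_to_words(int(tok)) if tok.isdecimal() and int(tok) <= 999 else tok
        for tok in text.split()
    )


print(expand_numbers("flight 109 is boarding"))
# flight one hundred and nine is boarding
```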

As you’ve seen, a range of speech synthesis systems have been 
developed that are capable of producing reasonable quality speech. 
As yet, none of them approaches the range and fluency of human 
speech, although the quality of speech synthesis systems is 
increasing all the time. 

■ The quality of speech synthesis depends on: 

The method used (whole words, syllables or phonemes or even 
parts of phonemes, and whether the method uses rules about 
inflection based on context and punctuation, or doesn’t apply 
such rules); 

The number of bits used to represent each part of the sound that 
makes up the speech (the more used, the larger the file but the 
better the reproduction); 

The quality and range of the equipment used with the computer 
system (e.g. loudspeakers) and the computer processor’s speed. 



Talking to robots 

Today a range of sophisticated robots are being 
developed as household companions. Examples are: 
AIBO (Artificial Intelligence roBOt) 

The Sony Corporation of Japan was the first company to 
market a home robot. 

Sony’s AIBO is a small dog-like robot and is equipped with 
a range of sensors, including touch sensors that can 
detect when it is being stroked, microphones to pick up 
sounds and a video camera that provides very 
rudimentary vision. 




Figure 3.4 AIBO 



PaPeRo (Partner-type Personal Robot):

Another Japanese company, NEC, unveiled its household 
robot in 2001. 

PaPeRo is a very small, brightly colored autonomous 
robot designed to live in the home. 

PaPeRo communicates with its users through speech 
recognition and speech synthesis and is capable of 
responding to more than 3,000 phrases. 




Figure 3.5 PaPeRo 

■ TAMA. 

Matsushita has demonstrated a household robot, Tama, that is 
superficially a simple robot, resembling a child’s teddy bear. 

It is intended to be used by elderly people, particularly those showing
signs of memory loss. Tama can be used as a personal reminder
system.

Tama uses voice recognition to receive instructions from its owner and 
speech synthesis to generate its response. 

Tama is equipped with remote medical sensors to monitor its owner’s 
heartbeat and blood pressure. 

These figures are then transmitted wirelessly to a doctor or nurse, 
removing the need for regular medical check-ups or intrusive home 
visits. 

Likewise, if time passes and Tama does not receive any interaction from 
its owner it can raise an alarm and call medical services for assistance. 
Systems like Tama allow people to retain their independence for longer. 





Non-speech sound 

This section aims to: 

Describe the different types of sound; 

■ Show how computing and digital 
techniques can be applied to music; 

■ Show how sound effects in computer 
interfaces can be used. 

Different types of sound 
Music: 

Although production of music does not require the use of a 
computer, computer-based systems for music composition can help 
those who do not read or write music to compose and notate their 
compositions. 

In addition, music is increasingly being used as a background for 
information presentation on the web and as an accompaniment to 
computer games. 

Here, music is used to evoke particular moods or feelings in the user, 
rather than conveying information that is important to the use of the 
application. 

Computers can also be used to play music in CD players and MP3 
players, for example. 

Alerts: 

Sound can be used in computing (and even simple 
mechanical) systems for alerts. 

Microwave ovens indicate when they have finished cooking by 
issuing periodic beeps, as do some washing machines when 
they have completed their washing cycle. 

■ These sounds are generated by the embedded computer 
that controls the appliance. 

■ The sounds are designed to attract attention, perhaps to 
alert the user that they need to do something. 

Warnings: 

■ Sound effects are very commonly used in situations where it 
is necessary to draw an operator’s attention to something: 
warning or confirmation sounds for a pilot in an aircraft’s 
cockpit, attention-getting sounds in the control room of a 
power station, and so on. 

Noise: 

■ Unwanted sounds that appear at many different frequencies
and amplitudes; noise is usually interpreted as carrying no useful
information.

■ Noise can interfere with hearing if it predominates or is too 
loud; therefore, in general, we are interested in its reduction. 



Music 



Digital technologies are becoming an increasingly 
important part of music technology. 

■ Computers are taking over in the capture, storage and 
reproduction of music: 

The most important reason is that music stored in digital form 
can be copied without any loss of quality. 

This is not the case with analogue forms of music storage (on
vinyl discs and magnetic tape), where, even with the best
equipment, a copy will not be as good as the original.

Analogue sounds must be converted into a digital representation
by sampling.

The sounds may then be converted back into an analogue signal 
for reproduction through loudspeakers. 
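
Sampling can be illustrated with a short Python sketch that measures a pure tone at regular intervals and rounds each measurement to a whole-number level. The function name `sample_tone` and all parameter values are assumptions chosen to keep the output short; real recordings use far higher sampling rates.

```python
import math


def sample_tone(frequency_hz, sample_rate_hz, duration_s, bits=8):
    """Sample a sine wave and round each value to a signed integer level."""
    max_level = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    count = round(sample_rate_hz * duration_s)
    return [
        round(max_level * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(count)
    ]


# One cycle of a 1000 Hz tone sampled 8000 times per second.
samples = sample_tone(frequency_hz=1000, sample_rate_hz=8000, duration_s=0.001)
print(samples)  # [0, 90, 127, 90, 0, -90, -127, -90]
```

The list of integers is the digital representation; playing it back means turning those levels into an analogue voltage again at the same rate.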



Digital recording and reproduction 

■ Digital recording techniques have transformed music 
recording and reproduction. 

■ The two commonly used formats are CD and MP3 formats. 

■ Key differences between CD and MP3 formats are:

MP3 format                          | CD format
------------------------------------+----------------------------------
Has various sampling frequencies    | Has a fixed sampling frequency
Produces smaller file size, which   | Produces bigger file size
enables easy and rapid transmission |
over the internet                   |
Produces lower fidelity/accuracy    | Produces higher fidelity/accuracy
outcome                             | outcome
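
The difference in file size can be estimated with simple arithmetic. The sketch below uses the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels); the MP3 bit rate of 128 kbit/s is an assumed, commonly used setting rather than a fixed property of the format, and the function names are hypothetical.

```python
def uncompressed_bytes(seconds, sample_rate_hz=44_100, bits=16, channels=2):
    """Size of raw audio at CD parameters (44.1 kHz, 16-bit, stereo)."""
    return seconds * sample_rate_hz * bits * channels // 8


def compressed_bytes(seconds, bitrate_bps=128_000):
    """Size at a constant bit rate; 128 kbit/s is an assumed MP3 setting."""
    return seconds * bitrate_bps // 8


three_minutes = 180
print(uncompressed_bytes(three_minutes))  # 31752000 bytes, about 31.8 MB
print(compressed_bytes(three_minutes))    # 2880000 bytes, about 2.9 MB
```

Under these assumptions a three-minute track is roughly eleven times smaller as an MP3 file, which is what makes transmission over the internet so much quicker.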

Manipulating digital music 

A computer system enables the manipulation of digital
recordings:

It enables the recording technician to easily join parts of different 
performances in the studio to achieve a more perfect extended 
recording. 

The musician can repeat part of a piece of music in order to get the best 
possible performance of that part, and all the best parts can be joined 
together. 

Unwanted noises such as coughs can be removed from a 
recording. 

Old recordings can be corrected.

Musical Instrument Digital Interface 

For over 20 years, the Musical Instrument Digital 
Interface (MIDI) standard has been in common use 
among musicians. 

■ It enables the connection of a musical instrument, 
usually an electronic keyboard resembling a piano 
keyboard, to a computer. 

■ A MIDI interface with appropriate software enables a 
composer to compose at the keyboard without having to 
pause to write down the music. 

MIDI is not a digital recording of the sounds. Instead, 
MIDI records the pitch and length of notes as they were 
played, in order that the sounds can be re-created later. 




Advantages of the MIDI standard: 

1. By separating the note from the sound, a tune
in MIDI form can be replayed using different
instruments.

2. MIDI files tend to be much smaller than the
equivalent digital recording when stored in CD
or MP3 format.

This is because MIDI files do not store the actual sounds,
only descriptions of how to make them.
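
The separation of note from sound can be sketched as follows. The tune is held as a list of note numbers and lengths rather than as audio, using the MIDI convention that note 60 is middle C and note 69 is the A at 440 Hz. The names `TUNE`, `note_frequency` and `render` are hypothetical, and this is not a real MIDI file parser: it only describes what each instrument would play.

```python
# A tune stored as MIDI-style note data: (note number, length in beats).
# Note numbers follow the MIDI convention in which 60 is middle C.
TUNE = [(60, 1.0), (62, 1.0), (64, 2.0)]


def note_frequency(note_number):
    """Convert a MIDI note number to a frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note_number - 69) / 12)


def render(tune, instrument):
    """Describe how `instrument` would play the tune; no audio is produced."""
    return [
        f"{instrument}: {note_frequency(note):.1f} Hz for {beats} beats"
        for note, beats in tune
    ]


# The same three notes replayed on two different instruments.
for line in render(TUNE, "piano") + render(TUNE, "flute"):
    print(line)
```

The whole tune is three pairs of numbers, which is why such data is so much more compact than a sampled recording of the same passage.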

The differences between the audio CD and MP3 formats and the MIDI
format are summarized in Table 4.1.

How it works

CD/MP3 audio format: The CD/MP3 format is a digital recording of a
music performance. The sound signal is digitised before it is stored
in the CD/MP3 format.

MIDI format: MIDI contains instructions that instruments such as
electronic keyboards can interpret in order to play individual notes.

Strengths of the format

CD/MP3 audio format:
■ It can store any type of sound, including the human voice and
natural sounds.
■ Widely used in the music industry.
■ CD and MP3 players are very commonly available as electronic
consumer items.

MIDI format:
■ Widely used in music performance.
■ A very compact format.
■ Defines an interface standard for connecting electronic instruments.
■ The file can be edited and individual notes changed.
■ A piece of music can be orchestrated for different instruments.

Digital composition 

Music composition means capturing and processing the notes. 

■ There are essentially two ways in which computers can be used 
in music composition: 

1. To use a computer program to input musical scores using direct
annotation of notes. Music composition programs, such as
Sibelius, use the keyboard and mouse to input notes in much
the same way as words are entered into a word processor.

2. To use MIDI input. Here, synthesizers which output the MIDI
format can be used to capture music as it is played on the
synthesizer. The music can then be edited and stored on the
computer, using appropriate software, and replayed through the
MIDI interface so that composers can hear their compositions.

Using sound effects in computer interfaces 

After sight, hearing is our richest human sense. 

■ The use of sound in the user interface is increasing for many reasons: 

1. Our visual and auditory senses are interdependent: they work well 
together. 

For example, with our eyes, we can see in front of us, but not behind; with our ears we can 
hear sounds coming from behind us. 

2. Sound reduces the load on the user’s visual system and reduces the 
amount of information that must be presented visually. 

3. Sound reduces the visual attention that must be paid to a device. 

For example, mobile phones use audible rather than visual alerts because visual 
information may be missed if the user is not looking at the device. 

4. Sound is attention grabbing. Visual signs and alerts can be ignored by 
turning away from them, but it is much harder to ignore an audible alert. 

5. Sound helps to make computers more usable by people with visual 
impairment. 









The use of non-speech sound is beneficial in systems for 
people with visual impairment and in mobile computing devices. 

In mobiles, sound has two principal advantages: it does not take up 
any screen space and users can hear the sound even if they are not 
able to look at the device. 

■ Researchers working on the use of non-speech sound in
human-computer interfaces have coined the term ‘earcons’.

■ Earcons: nonverbal audio messages that are used in the
human-computer interface to provide information to the user
about some computer object, operation or interaction (Blattner et
al., 1989).

■ Earcons are based on musical sounds and they have been 
shown to be an effective means of communicating information. 
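
One way to picture an earcon is as a short musical motif attached to an interface event. In the sketch below, the event names, the motifs and the function `earcon_samples` are all invented for illustration; the code only generates the sample values for each motif and does not play them.

```python
import math

# Hypothetical earcons: each event maps to a short motif of
# (frequency in Hz, duration in seconds) pairs.
EARCONS = {
    "file_opened": [(440.0, 0.1), (660.0, 0.1)],    # rising pair of notes
    "file_deleted": [(660.0, 0.1), (330.0, 0.2)],   # falling pair of notes
}


def earcon_samples(event, sample_rate_hz=8000):
    """Return sine-wave samples (floats from -1 to 1) for the event's motif."""
    samples = []
    for frequency, duration in EARCONS[event]:
        for n in range(round(sample_rate_hz * duration)):
            samples.append(math.sin(2 * math.pi * frequency * n / sample_rate_hz))
    return samples


print(len(earcon_samples("file_opened")))  # 1600 samples, i.e. 0.2 s at 8 kHz
```

Using a rising motif for one event and a falling motif for another is a design choice made here so that the two sounds are easy to tell apart without looking at the screen.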
Handwriting recognition

So far, this unit has concentrated on one particular 
sense - hearing - and the ways in which sound can be 
used to represent information and to facilitate 
communication between humans and computers. 

■ In this section, you will study: 

The challenge that individual differences in handwriting
pose to automated analysis;

Some of the approaches that can be taken to simplify the 
task of handwriting recognition; 

Some personal digital assistants (PDAs) which use 
handwriting recognition, and some of the handwriting 
recognition systems associated with these devices. 




Writing systems 

Most computers are programmed only to respond to the 
movements and clicks of a mouse, or keystrokes on a 
keyboard. 

When you compare this with the richness of 
interaction between humans or how we interact with 
the outside world (e.g. driving a car, painting a 
picture), it is clear that the range of methods used to 
interact with computers is small. 

The keyboard is a device that contains a number of keys,
usually arranged in the QWERTY layout (the same layout as the early
mechanical design of manual typewriters).



For those of us who use the Latin alphabet (26 letters, 10 digits
and a few other characters) this keyboard works well.

However, this keyboard is not suitable for many other
writing systems, such as Japanese, which is made up of many
thousands of characters.



■ An alternative to the keyboard is needed. 

An obvious solution is to use handwriting to input 
information into the computer. 

■ Handwriting is a skill which is used widely in non- 
computing situations. 

If that skill could be brought to the computer, people 
would be able to use the power of the new technology 
with little or no additional training. 

Handwriting recognition 

One of the markets for computers that has only become
viable in the last few years is the hand-held or pocket
computer.

■ As computers have become smaller, new problems have 
emerged specifically with the keyboard. 

Because keys become smaller, and are positioned more closely 
together, touch typing becomes increasingly difficult. 

This affects the usability of the keyboard. 

■ Handwriting recognition via a touch-sensitive screen is, 
then, an appealing alternative. 

Handwriting recognition is difficult for computers to perform:

1. There is a wide diversity of writing systems in use throughout the
world, such as those based on the Latin, Arabic, or Cyrillic
alphabets.

There are differences in the directions in which the different writing systems are
written and read (left to right for Latin, right to left for Arabic, and top to bottom
for traditional Chinese, for example).

2. There are large individual differences in writing style. We write our
characters with different shapes, place differing amounts of stress
on the strokes comprising each character, and even use different
numbers of strokes in the characters.

3. Human beings are extremely good at resolving ambiguity in
characters or words that are difficult to read or don’t make sense,
using the context in which they appear to help. Programming
knowledge of context and common-sense knowledge into a
computer is extremely difficult.







Simplifying the task 



■ Simplifying the task of handwriting recognition involves 
constraining the range of potential inputs that the computer 
might have to recognize, such as: 

1. Restricting the range of symbols that can be used to just the
upper case letters, numbers and a few punctuation marks.

2. Requiring that characters are written in predefined boxes
(such as those that appear on many paper forms that we fill
in today).

3. Accepting only handwritten characters that are not joined up
(cursive handwriting is not accepted).

4. Redesigning the interface so that it is very clear what input is
required (simple ticks and crosses, or other forms of
shorthand rather than handwritten text).



Neural networks and their use in handwriting recognition 



Handwriting recognition has become one of the cornerstones of
artificial intelligence (AI): the study of the possibility of building
machines which can reproduce the thought processes of human brains.
Many handwriting systems use so-called neural networks (or neural
nets), a technology based on the workings of the human brain.

■ Neural networks are made up of interconnecting
artificial neurons or nodes (programming
constructs that mimic the properties of biological
neurons) and consist of algorithms and
procedures which attempt to simulate the
connectionist computational architecture of the
human brain on a computer system.

Although neural nets are extremely powerful tools,
each neuron is very simple. However, the rules
regulating the learning process can be extremely
complex, and applying them can require powerful computers.



Figure: A simple neural network (input layer, hidden layer, output layer)





■ Neural nets must be trained before they can become useful. 

■ This is done by presenting the neural net with known data and 
recording its response. If the network produces the correct 
answer it moves on to the next example. If the network 
produces an incorrect answer, the software adjusts the 
relative importance of the links inside the network and the test 
is repeated. 

■ Through this training process, the neural net creates 
equivalents of the chains of links found in the human brain. 

■ Just like the human brain, neural nets can be very good at: 

Distinguishing patterns in noisy information. 

Distinguishing a particular face out of hundreds (used in facial 
recognition security systems) 

Distinguishing trends in complicated flows of information such as those 
of the stock-market. 
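
The training loop described above (present known data, compare the answer, adjust the links when the answer is wrong, repeat) can be sketched with a single artificial neuron. This uses the classic perceptron learning rule, which is far simpler than the networks used for handwriting recognition; the function names and the choice of the logical AND as training data are assumptions made for illustration.

```python
def train(examples, epochs=20, rate=1.0):
    """Train a single artificial neuron with the perceptron rule.

    `examples` is a list of (inputs, target) pairs with target 0 or 1.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            total = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1 if total > 0 else 0
            error = target - output   # 0 when the answer is correct
            if error:
                # Wrong answer: adjust the importance of each link.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return the neuron's 0/1 answer for one set of inputs."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


# Known data: the logical AND of two inputs.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

After a few passes through the data the neuron stops making mistakes, so no further adjustments are needed.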

A good example of handwriting recognition with neural
networks is the Newton MessagePad, released by Apple in 1993.

It used powerful neural net software to interpret
handwriting.

The Newton had several novel features: 

You could use cursive handwriting to interact with the 
computer. 

The Newton would learn to recognize your 
handwriting; there was no need to change your style 
to suit the computer. 

■ The user interacted with the computer by writing 
on the display using a stylus. 

When the user entered a word, the system attempted to match it against the 
words in its internal dictionary. If a match was found, the word was recognized; 
otherwise, the user could tell the Newton to add the word to its internal dictionary. 

As time went on, the Newton became more and more accurate. 
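
The dictionary step can be sketched as follows. This is a hedged illustration, not the Newton's actual algorithm: it assumes the recognizer has already produced a candidate string, and it uses Python's standard `difflib` module to find the closest known word. The class name, the sample words and the cutoff value are hypothetical.

```python
import difflib


class WordDictionary:
    """Hypothetical stand-in for the recognizer's internal dictionary."""

    def __init__(self, words):
        self.words = set(words)

    def recognize(self, candidate, cutoff=0.8):
        """Return the closest known word, or None if nothing is close enough."""
        matches = difflib.get_close_matches(
            candidate, sorted(self.words), n=1, cutoff=cutoff
        )
        return matches[0] if matches else None

    def add(self, word):
        """Called when the user asks for an unknown word to be remembered."""
        self.words.add(word)


dictionary = WordDictionary(["hello", "meeting", "monday"])
print(dictionary.recognize("meetinq"))   # a slightly misread "meeting"
print(dictionary.recognize("simputer"))  # not in the dictionary yet
dictionary.add("simputer")
print(dictionary.recognize("simputer"))  # found once the user has added it
```

Because unknown words are added as the device is used, the chance of a successful match grows over time, which is one way accuracy can improve with use.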




Figure 5.5 The Newton MessagePad 






Handwriting recognition 

■ Why is it hard to use handwriting recognition on pocket 
PCs? 

Handwriting recognition needs a large amount of memory and 
processing power, which greatly reduces the battery life of a 
pocket PC or Palm (PDA). 

Palm Computing wanted to produce a pocket computer 
with a lower price and a battery life of weeks or even 
months by using a slow microprocessor and a small 
memory. 

■ Palm decided to simplify the way letters are entered, as 
this would simplify the task of recognition. 

Palm’s solution was to use glyphs. 

Handwriting recognition 



■ Glyphs are highly stylized equivalents of letters, numbers and 
common punctuation characters (a glyph is how a character looks). 

■ Most glyphs can be completed in a single stroke of the stylus 
and each is sufficiently different from all the others to make 
the recognition process tractable, even on a relatively slow 
microprocessor. Figure 5.7 illustrates this. 
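
To see why single-stroke glyphs keep recognition cheap, consider the sketch below. It reduces a stroke to a short string of compass moves and looks that string up in a table. This is an assumption-laden illustration, not Palm's algorithm: the `TEMPLATES` table is hypothetical and is not the real Graffiti alphabet, and `min_move` is an invented jitter threshold.

```python
def stroke_directions(points, min_move=5):
    """Reduce a stylus stroke to a string of compass moves (U, D, L, R)."""
    directions = []
    last_x, last_y = points[0]
    for x, y in points[1:]:
        dx, dy = x - last_x, y - last_y
        if max(abs(dx), abs(dy)) < min_move:
            continue  # ignore jitter smaller than min_move units
        if abs(dx) >= abs(dy):
            move = "R" if dx > 0 else "L"
        else:
            move = "D" if dy > 0 else "U"  # screen y grows downwards
        if not directions or directions[-1] != move:
            directions.append(move)  # collapse repeated moves
        last_x, last_y = x, y
    return "".join(directions)


# Hypothetical glyph table: one direction string per character.
TEMPLATES = {"D": "I", "DR": "L", "DRU": "U", "R": " "}


def recognize(points):
    """Return the character for a stroke, or None if it matches no glyph."""
    return TEMPLATES.get(stroke_directions(points))


print(recognize([(0, 0), (0, 10), (0, 20), (10, 20), (20, 20)]))
```

The whole recognizer is a few comparisons and one table lookup per stroke, which is the kind of work a slow microprocessor with little memory can handle.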
Figure 5.7 The Graffiti alphabet 

Handwriting recognition 

■ Palm named their system Graffiti™ and launched it as part of 
the Palm Pilot hand-held computer, a machine that, when 
compared to the Newton, appeared underpowered and 
primitive. 

■ The Palm Pilot lacked the easy handwriting recognition of the 
Newton and required people to learn the Graffiti system 
before they could use much of the machine’s functionality. 

■ It turned out that the Palm Pilot was more attuned 
(appropriate) to the requirements of the marketplace than the 
Newton. 

■ The limitations of Graffiti™ were not so evident when relatively 
little information was being entered into the machine. 

Trading off sophistication for battery life and a small size 
turned Palm into one of the boom companies of the dot.com era. 

Tangible computing and gesture computing 

■ In this section, you will study two related means of 
communicating and interacting with computers using our 
sense of touch: 

Tangible computing, which involves devices that can be 
used to interact with representations of information in the 
digital world; 

Gesture computing, where computers are programmed to 
interpret human gestures and movements. 

■ This area of human-computer interaction is sometimes 
known as haptic computing. 

■ Haptic computing refers to the branch of computing 
that studies interfaces which sense the body’s 
movements and interpret these movements as input to 
the computer. 



Tangible computing and gesture computing 

In contrast to the visual and auditory senses, which are 
primarily used for communication of output - from the 
computer to the user - the sense of touch can be used bi- 
directionally. 

A haptic device can sense the body’s movements and can 
also give feedback to the user. 

This feedback may be tactile (the feel of the device) or 
kinaesthetic (relating to the position of the body or limbs). 

Haptic input involves physical contact between the 
computer and the user. Often this is via the hands using a 
keyboard, mouse or joystick, but it may involve other parts 
of the body such as the feet. 

Tangible computing and gesture computing 

Tangible computing 

A tangible interface is an interface that gives a physical form to 
digital information. 

■ A physical object can be both a representation of digital 
information and a controller for such information. 

■ Examples of tangible user interfaces are: 

1. A Personal Digital Assistant (PDA) that has extra controls and 
sensors added to it, so that physically manipulating the PDA 
controls the display of information on the PDA screen. 

Thus the physical device itself becomes the interface through the sensor 
technologies embedded in it. 

For example, scrolling through a sequential list on a PDA, such as an 
address book, can be implemented simply by tilting the PDA forwards or 
backwards to scroll forwards or backwards through the list. 

To stop scrolling the list, the user ceases to tilt the PDA, and to select a 
particular item the user squeezes the PDA. 
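
The tilt-and-squeeze interaction described above might be sketched like this. It is a minimal illustration that assumes the PDA reports a tilt angle in degrees and a squeeze pressure between 0 and 1. The thresholds, function names and sample readings are all hypothetical.

```python
def tilt_action(tilt_degrees, squeeze_pressure, dead_zone=10.0, squeeze_threshold=0.7):
    """Map raw sensor readings to 'select', 'forward', 'backward' or 'stop'."""
    if squeeze_pressure >= squeeze_threshold:
        return "select"
    if tilt_degrees > dead_zone:
        return "forward"
    if tilt_degrees < -dead_zone:
        return "backward"
    return "stop"  # device held roughly level: no scrolling


def browse(readings, items):
    """Step through items according to the readings; return the selected item."""
    index = 0
    for tilt, squeeze in readings:
        action = tilt_action(tilt, squeeze)
        if action == "select":
            return items[index]
        if action == "forward":
            index = min(index + 1, len(items) - 1)
        elif action == "backward":
            index = max(index - 1, 0)
    return None  # the user never squeezed the device


contacts = ["Ada", "Ben", "Cleo"]
readings = [(20, 0.0), (20, 0.0), (0, 0.0), (-20, 0.0), (0, 0.9)]
print(browse(readings, contacts))
```

The dead zone is a design choice: without it, the small tilts that come from simply holding the device would scroll the list unintentionally.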





Tangible computing and gesture computing 

2. Devices that can provide feedback through the 
sensation of resistance to movement. 

■ Force-feedback devices are computer peripheral 
devices such as steering wheels and joysticks that 
give tactile feedback to the user, often through 
increasing the resistance to motion as the device is 
operated. 

■ These devices are commonly used in driving and 
flying simulators in order to increase the realism of 
the simulation. 
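
One simple way to model resistance that increases as the device is operated is a spring: the further a steering wheel is turned, the harder it pushes back. The sketch below assumes this spring model. The stiffness and torque limit are invented values in arbitrary units, not figures from any real device.

```python
def resistance_torque(angle_degrees, stiffness=0.05, max_torque=3.0):
    """Opposing torque (arbitrary units) for a wheel turned by angle_degrees."""
    torque = -stiffness * angle_degrees  # sign opposes the direction of turn
    # The motor in the device can only push so hard, so clamp the result.
    return max(-max_torque, min(max_torque, torque))


for angle in (0, 20, 40, 500):
    print(angle, resistance_torque(angle))
```

A simulator would typically vary the stiffness with the simulated conditions, for example the speed of the car, but that goes beyond this sketch.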

Tangible computing and gesture computing 

3. Other application domains for tangible user interfaces include 
information manipulation and visualization. 

■ In information visualization, physical objects (such as data 
gloves) are used as manipulable ‘containers’ for digital media. 

■ A data glove is an instrument that 
detects the joint angles of the 
fingers and thumb, and the 
position of the hand in 3D space. 

■ Data gloves are often used in 
virtual reality applications, where 
they can be used by the wearer 
to manipulate virtual objects. 

Figure 6.4 Data gloves 

Tangible computing and gesture computing 

Gesture recognition 

Gesture recognition is another communication mechanism that is 
particularly appropriate when it is necessary to use a computer without 
using a keyboard or screen, or when the user is unable to hear and 
often, as a consequence, to speak. 

■ In such a mechanism, human gestures such as sign language are 
interpreted as input to a computer system (and then converted into 
speech). 

■ The most commonly used sign language for communication is probably 
American Sign Language (ASL). 

■ ASL is based on whole-word representation, unlike related 
communication systems such as finger-spelling. It currently consists of 
about 6000 gestures for common words. 

Research groups at the Massachusetts Institute of Technology (MIT) 
and the University of Pennsylvania have run separate projects on 
interpreting ASL. 

Tangible computing and gesture computing 

The major technical challenge with interpreting human gestures 
such as sign language is tracking them since they are made free- 
form, in the air, primarily by the hands. 

Two approaches have been tried to address the problem: 

1. The first is for the person making the sign language gestures to 
wear special gloves, which make it easier for an image-recognition 
system to track the hands against a general 
background. This is a vision-based solution. 

2. The second approach is for the signer to wear special sensors 
(usually based upon electromagnetic fields), which allow the 
computer to track the position of the hands in three dimensions. 
This approach does not use computer vision. Instead, it monitors 
changes to the electromagnetic field caused by the gestures made 
by the signer. 






Tangible computing and gesture computing 

An increasingly important area of human-computer 
interaction involves the interpretation of movements 
made by the human eye, usually called eye tracking. 

■ Eye-tracking systems are used to study how people use 
desktop computer applications by following which 
part of the screen the user is looking at and for how long. 

■ Another application area is as an assistive device for 
people who have limited or no ability to move their 
hands. 
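
Measuring which part of the screen the user looks at, and for how long, can be sketched as below. The example assumes the eye tracker delivers gaze positions as (x, y) screen coordinates at a fixed sampling interval. The region names, rectangle coordinates and the 20 ms interval are hypothetical.

```python
from collections import defaultdict


def dwell_times(samples, regions, sample_interval_ms=20):
    """Total time in milliseconds that the gaze spent in each named region."""
    totals = defaultdict(int)
    for x, y in samples:
        for name, (left, top, right, bottom) in regions.items():
            if left <= x < right and top <= y < bottom:
                totals[name] += sample_interval_ms
                break  # a gaze sample belongs to at most one region
    return dict(totals)


# Hypothetical screen layout: (left, top, right, bottom) in pixels.
REGIONS = {"menu": (0, 0, 800, 50), "document": (0, 50, 800, 600)}
gaze = [(10, 10), (400, 300), (400, 310), (900, 900)]
print(dwell_times(gaze, REGIONS))
```

Samples that fall outside every region, such as a glance away from the screen, are simply not counted.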

Ubiquitous computing 



Ubiquitous computing is a term coined by Mark Weiser, who defined 
it as: Making many computers available throughout the physical 
environment, while making them effectively invisible to the user. 

We may be unaware of the microcomputer that is timing our microwave 
oven or operating the engine management system in our car. 

■ In order to achieve Weiser’s vision of computing, the challenge of 
developing more natural human-computer interfaces must be met. 

Such natural interfaces will not be like the desktop computer interfaces 
that we are used to; they may be more like the interfaces and 
interactions that we currently have with other objects in our physical 
environment. 



Humans speak to, gesture at, touch and write for other humans in order to 
communicate with them. We are beginning to be able to do this with 
computers through advances in speech recognition and speech synthesis, 
in handwriting recognition, and in tangible and gesture computing. 



Computers could be seen as an adjunct to the human body, for example, 
through intelligent clothing. In other words, computers may become 
wearable. 

Ubiquitous computing 



Several key issues are implicit within ‘ubiquitous computing’: 

■ There are many computers in the environment, often serving a 
specialized purpose. 

■ The computers will be embedded within the physical environment 
and other equipment. 

■ The computers will be small and will not look like conventional 
computers. 

■ The computers will be invisible in the sense that people will not 
be aware that they are using a computer, not that they won’t be 
able to see it. This will require significant development in the 
ways in which we interact with computers and in the 
development of more natural interfaces to computers. 

■ Many of the computers will be networked and will communicate 
with each other. 

Unit Summary 



■ In this unit, you’ve studied: 

The digital divide, the causes of the digital divide and one 
innovative hand-held computer (the Simputer) that is intended to 
bridge it. 

Speech recognition and speech synthesis using a computer. 

How different types of sound (specifically music, sound alerts 
and warnings) are used in human interactions with computers. 

How music can be stored and manipulated by computers, 
particularly using the MIDI interface. 

Some of the challenges which must be overcome in order to 
develop handwriting recognition software for a personal 
computer. 

Some of the hand-held computers that use handwriting 
recognition and the details of the handwriting recognition 
systems that they use. 

Several examples of novel types of computer interfaces that 
involve the interpretation of human movement. 

The concept of ubiquitous computing. 